Support PaddlePaddle with compatible API #1642
SigureMo wants to merge 8 commits into flashinfer-ai:main
Conversation
Summary of Changes
Hello @SigureMo, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request implements a strategic compatibility layer to bridge PyTorch-dependent projects with PaddlePaddle. Its primary goal is to facilitate the adoption of libraries like FlashInfer within the PaddlePaddle ecosystem by providing a seamless, opt-in mechanism that intelligently adapts build processes and API calls without disrupting existing workflows. The changes are designed to be non-invasive and environment-gated, ensuring flexibility and stability for both PyTorch and PaddlePaddle users.
Highlights
- PaddlePaddle Compatibility: Introduces a minimal, opt-in compatibility layer so PyTorch ecosystem libraries, such as FlashInfer, can run on PaddlePaddle with minimal code changes.
- Design Approach: Compatibility is achieved through a C++/CUDA adapter for PyTorch's C API surface (ATen/c10/torch), a Python layer matching PyTorch's API shape (allowing `import paddle as torch`), and an import proxy (`paddle.compat.enable_torch_proxy()`) that makes `import torch` load `paddle`.
- Opt-in Mechanism: The compatibility features are activated by setting the `PADDLE_COMPATIBLE_API` environment variable, ensuring that default behavior remains unchanged for regular PyTorch or Paddle users.
- Build System Adjustments: Modifies the JIT compilation logic and `setup.py` to conditionally include PaddlePaddle-specific include paths and linker flags, or PyTorch ones, based on the compatibility mode.
- Initial Testing: Successfully tested the `flashinfer.fused_moe.cutlass_fused_moe` interface with fp16 unit tests under the new compatibility mode.
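The opt-in mechanism above can be sketched with a small helper. `use_paddle_compatible_api` is the name used elsewhere in this PR, but its body below is an assumption; the exact truthiness check in the real code may differ.

```python
import os


def use_paddle_compatible_api() -> bool:
    # Compatibility mode is opt-in: it is active only when the
    # PADDLE_COMPATIBLE_API environment variable is set (here: to "1").
    return os.environ.get("PADDLE_COMPATIBLE_API", "0") == "1"


# Default behavior stays unchanged for regular PyTorch or Paddle users;
# setting the variable flips the build/runtime into compatibility mode.
print(use_paddle_compatible_api())
os.environ["PADDLE_COMPATIBLE_API"] = "1"
print(use_paddle_compatible_api())
```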
Code Review
This pull request introduces a compatibility layer for PaddlePaddle, allowing third-party PyTorch ecosystem libraries to run on Paddle with minimal changes. The changes involve adapting C++/CUDA APIs and reorganizing the Python API to match PyTorch's structure. The integration is opt-in, controlled by the PADDLE_COMPATIBLE_API environment variable. The review focuses on correctness and maintainability, particularly concerning the conditional logic for PaddlePaddle compatibility and the modifications to the build process.
```python
if enable_aot:
    if use_paddle_compatible_api():
        import paddle
```
Consider adding a check to ensure `paddle` is importable before calling `paddle.compat.enable_torch_proxy()`. If paddle is not installed or available in the environment, this could raise an `ImportError` and break the build process. A `try`/`except` block can handle this scenario gracefully.
```diff
 if enable_aot:
     if use_paddle_compatible_api():
-        import paddle
+        try:
+            import paddle
+            paddle.compat.enable_torch_proxy()
+        except ImportError:
+            print("PaddlePaddle is not installed. Skipping paddle.compat.enable_torch_proxy().")
```
```python
if use_paddle_compatible_api():
    system_includes.extend(
        [
            "$torch_home/include",
            "$torch_home/include/torch/csrc/api/include",
        ]
    )
```
The system_includes.extend method is called within the if block, but the system_includes list already contains some default paths. This could lead to duplicate include paths if use_paddle_compatible_api() returns True, potentially causing issues during compilation. Consider adding the default paths within the else block to avoid duplication.
Alternatively, you can initialize system_includes as an empty list and populate it entirely within the if and else blocks to ensure no overlap.
```python
system_includes = [
    sysconfig.get_path("include"),
    "$cuda_home/include",
    jit_env.FLASHINFER_INCLUDE_DIR.resolve(),
    jit_env.FLASHINFER_CSRC_DIR.resolve(),
]
if use_paddle_compatible_api():
    system_includes.extend(
        [
            "$torch_home/include",
            "$torch_home/include/torch/csrc/api/include",
        ]
    )
else:
    system_includes.extend(
        [
            "$torch_home/include",
            "$torch_home/include/paddle/phi/api/include/compat",
            "$torch_home/include/paddle/phi/api/include/compat/torch/csrc/api/include",
        ]
    )
```

```python
ldflags = [
    "-shared",
    "-L$torch_home/lib",
    "-L$cuda_home/lib64",
    "-lc10",
    "-lc10_cuda",
    "-ltorch_cpu",
    "-ltorch_cuda",
    "-ltorch",
    "-lcudart",
]
```
The ldflags list is initialized with "-shared" and "-lcudart" regardless of the use_paddle_compatible_api() condition. This could lead to redundancy or conflicts if the subsequent extend calls also include -shared or -lcudart. Consider initializing ldflags as an empty list and adding these flags conditionally within the if and else blocks to avoid potential issues.
```python
ldflags = []
if use_paddle_compatible_api():
    ldflags.extend(
        [
            "-shared",
            "-L$torch_home/lib",
            "-L$cuda_home/lib64",
            "-lc10",
            "-lc10_cuda",
            "-ltorch_cpu",
            "-ltorch_cuda",
            "-ltorch",
            "-lcudart",
        ]
    )
else:
    ldflags.extend(
        [
            "-shared",
            "-L$torch_home/libs",
            "-L$torch_home/base",
            "-L$cuda_home/lib64",
            "-lpaddle",
            "-lphi",
            "-lphi_core",
            "-lphi_gpu",
            "-lcommon",
            "-lcudart",
        ]
    )
```

```diff
 return flashinfer.jit.gen_jit_spec(
     "logging",
     [
-        jit_env.FLASHINFER_CSRC_DIR / "logging.cc",
+        flashinfer.jit.env.FLASHINFER_CSRC_DIR / "logging.cc",
     ],
     extra_include_paths=[
-        jit_env.SPDLOG_INCLUDE_DIR,
-        jit_env.FLASHINFER_INCLUDE_DIR,
+        flashinfer.jit.env.SPDLOG_INCLUDE_DIR,
+        flashinfer.jit.env.FLASHINFER_INCLUDE_DIR,
     ],
 ).build_and_load()
```
Consider using `pathlib.Path.joinpath` instead of `/` for constructing paths. This is more explicit and readable. For example, `flashinfer.jit.env.FLASHINFER_CSRC_DIR.joinpath("logging.cc")`.
```diff
 return flashinfer.jit.gen_jit_spec(
     "logging",
     [
-        flashinfer.jit.env.FLASHINFER_CSRC_DIR / "logging.cc",
+        flashinfer.jit.env.FLASHINFER_CSRC_DIR.joinpath("logging.cc"),
     ],
     extra_include_paths=[
         flashinfer.jit.env.SPDLOG_INCLUDE_DIR,
         flashinfer.jit.env.FLASHINFER_INCLUDE_DIR,
     ],
 ).build_and_load()
```
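For what it's worth, the two spellings are equivalent in `pathlib`: the `/` operator and `Path.joinpath` build identical `Path` objects and both are platform-independent, so the choice is stylistic. A quick check:

```python
from pathlib import Path

base = Path("flashinfer") / "csrc"

# The `/` operator and joinpath produce the same Path object.
p_slash = base / "logging.cc"
p_join = base.joinpath("logging.cc")

print(p_slash == p_join)
```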
Hi @yzh119, thanks for the work here! I only learned about the recent TVM FFI efforts in the last couple of days. TVM FFI is indeed an excellent FFI solution for ML systems; thanks to you and @tqchen for driving this.

From what I see, TVM FFI can nicely decouple flashinfer from PyTorch by providing a framework-agnostic binding layer, which aligns well with the goals of our compatibility approach. This should remove a lot of pain points in our custom C++ operator ecosystem; at minimum we wouldn't need to worry about C++ ABI or operator registration compatibility anymore. I took a quick look at the implementation, and it seems we would likely only need a small adaptation for CUDA stream handling in TVM FFI (see: https://github.com/apache/tvm/blob/a819115375568e52f9d2d7376cdbb0a23346c3cb/ffi/python/tvm_ffi/cython/function.pxi#L110-L124). So I'm looking forward to your refactor.

Separately, TVM FFI as a more general, framework-agnostic custom-op solution opens up additional possibilities and could offer more options for our ecosystem compatibility strategy. Do you have any plans to promote or adopt TVM FFI in projects beyond flashinfer? If so, that could help more custom-op projects decouple from the PyTorch ecosystem and move toward a framework-agnostic custom-op ecosystem. @tqchen
Thanks @SigureMo! Yes, we do plan to bring TVM FFI up as an independent project that benefits all. We are still at the bring-up stage, so we haven't communicated it broadly, but the goal is to make it a general project that can be used across all deep learning frameworks, compilers, and libraries.
@yzh119 Thanks for the work on #1641! I can confirm the C++ layer no longer depends on PyTorch after that change, which removes the adapter maintenance we were carrying on our side. Really appreciate it.

I did notice the Python JIT workflow still references torch headers and some torch-specific compile flags (`flashinfer/flashinfer/jit/cpp_ext.py`, lines 98 to 119 at 08b8da3). Do you plan to remove those as well?

On the E2E validation: we already landed the Paddle prerequisites (PaddlePaddle/Paddle#75193 and PaddlePaddle/Paddle#75205), so I'm optimistic flashinfer will run on Paddle as smoothly as it does on PyTorch. I'll run the verification soon, likely right after the holiday.
Yes, most of them are no longer required; updated in #1795.
## 📌 Description

The codegen logic for PyTorch and TVM should unify after #1641, and this PR cleans up the related codegen functions in tvm_bindings.

Other changes:

1. Update tvm-ffi to 0.1.0b11 to incorporate apache/tvm-ffi#67 and apache/tvm-ffi#68.
2. Rename source files: `_ops.cu` and `_pybind.cu` renamed to `_binding.cu`.
3. Remove torch-related header includes and library linking in ninja files (#1642 (comment)).
4. Remove the use of `use_torch_stream` in unittests; it is no longer required after apache/tvm-ffi#68.

## 🔍 Related Issues

#1641

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [ ] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [ ] I have installed the hooks with `pre-commit install`.
- [ ] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [ ] Tests have been added or updated as needed.
- [ ] All tests are passing (`unittest`, etc.).

## Reviewer Notes

cc @MasterJH5574, please let us know what changes we need to make to help you bump to the latest version of flashinfer in MLC.
We are PaddlePaddle contributors working on a PyTorch compatibility layer aimed at making it significantly easier for PyTorch ecosystem libraries to run on Paddle. See context: #1563
Summary
Design

- A C++/CUDA compatibility layer¹ adapts PyTorch's C API surface (ATen/c10/torch) so C++/CUDA sources can build against Paddle.
- A Python layer matches PyTorch's API shape, so downstream projects can `import paddle as torch` and run with minimal or no source changes.
- An import proxy, `paddle.compat.enable_torch_proxy()`², makes `import torch` actually load `paddle`. This removes the need for `import paddle as torch` in most cases and keeps changes non-invasive.

Usage (example)
Install (build with compatibility enabled)

```shell
PADDLE_COMPATIBLE_API=1 pip install -v --no-build-isolation .
```

Runtime example
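A minimal runtime sketch of the proxy flow described above, not the PR's verbatim example: it assumes a Paddle build that ships `paddle.compat.enable_torch_proxy()`, and it degrades gracefully when Paddle is absent.

```python
import os


def enable_compat_runtime() -> str:
    """Turn on the torch import proxy if Paddle is available."""
    os.environ["PADDLE_COMPATIBLE_API"] = "1"
    try:
        import paddle

        # After this call, `import torch` is served by paddle.
        paddle.compat.enable_torch_proxy()
        import torch

        return f"torch proxied by: {torch.__name__}"
    except (ImportError, AttributeError):
        return "paddle (with compat) not installed; proxy skipped"


print(enable_compat_runtime())
```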
Why this is opt-in

All compatibility behavior is gated by the environment variable `PADDLE_COMPATIBLE_API`. When set, the compatibility hooks and small source adjustments are enabled. This keeps the default behavior unchanged for regular PyTorch or Paddle users.

Small changes requested in flashinfer

- `setup.py` / AOT build: during AOT compilation `setup.py` currently does `import torch`. For compatibility builds we need the build to perform `paddle.compat.enable_torch_proxy()` early (before `import torch`), or otherwise provide a small hook so the build's imports load `paddle` instead.
- We check the environment variable (`PADDLE_COMPATIBLE_API`) inside flashinfer; if present, we enable the compatibility adjustments only in that mode.

Would these minimal, environment-gated changes be acceptable to the flashinfer maintainers?
What we tested

We targeted the `flashinfer.fused_moe.cutlass_fused_moe` interface. With the compatibility mode enabled and some additional Python-side compatibility work in progress, we successfully ran fp16 unit tests for that interface.

Next steps (proposed)

Keep the integration minimal and environment-gated (`PADDLE_COMPATIBLE_API=1`) and gradually increase coverage.

Thank you for reviewing this PR; we welcome your feedback on the minimal integration approach and are ready to iterate on the branch or make any changes you prefer.
Footnotes

1. https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/phi/api/include/compat ↩
2. https://github.com/PaddlePaddle/Paddle/blob/b38a9503d4f3f7c84af44a6399bb76ee043e7616/python/paddle/compat.py#L110 ↩